reinforcement learning - NISHIO Hirokazu's Scrapbox (Auto-translated from Japanese)

List of documents from internal workshops

Below is a draft for a 10-minute commentary at an internal study session

reinforcement learning

supervised learning

Input and Teacher Data

In Go, the teacher data is the game record (kifu).

Who's going to make the teacher data?

People.

Ten or a hundred examples are nowhere near enough.

AlphaGo

160,000 games

28.4 million board positions

57.0% accuracy in predicting the human expert's next move

self-play

How many times?

state-value network

Training data is taken from the results of self-play games.

Only one board position is taken from each game.

30 million positions = 30 million games
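
The sampling rule above can be sketched roughly as follows. This is a toy illustration, not AlphaGo's actual pipeline: the function name `sample_one_position_per_game`, the `(positions, outcome)` data layout, and the string stand-ins for boards are all assumptions made for the example, and the outcome is stored as-is without adjusting for whose turn it is. The last lines redo the arithmetic from the notes: the human data yields many positions per game, whereas one position per game means the number of games must equal the number of positions.

```python
import random

def sample_one_position_per_game(games, rng):
    """Return one (position, outcome) pair per game.

    `games` is a list of (positions, outcome) tuples, where `positions` is
    the sequence of boards in one self-play game and `outcome` is +1 or -1.
    The layout and the function name are hypothetical.
    """
    samples = []
    for positions, outcome in games:
        position = rng.choice(positions)  # keep a single board from this game
        samples.append((position, outcome))
    return samples

# Toy stand-in for self-play results: 5 games of 10 moves each.
rng = random.Random(0)
games = [
    ([f"game{g}-move{m}" for m in range(10)], rng.choice([+1, -1]))
    for g in range(5)
]
dataset = sample_one_position_per_game(games, rng)
print(len(dataset))  # number of samples equals number of games

# Human game records: 28.4 million positions from 160,000 games.
positions_per_human_game = 28_400_000 / 160_000
print(positions_per_human_game)

# Self-play with one position per game: games needed == positions wanted.
games_needed = 30_000_000 // 1
print(games_needed)
```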

---

This page is auto-translated from /nishio/強化学習 using DeepL. If you find something interesting but the auto-translated English is not good enough to understand, feel free to let me know at @nishio_en. I'm very happy to spread my thoughts to non-Japanese readers.